
    Bioinformatic approaches to identify genomic, proteomic and metabolomic biomarkers for the metabolic syndrome

    Advances in technology have turned modern biology into a data-intensive enterprise. The advent of high-throughput technologies such as microarrays and next-generation sequencing has left researchers grappling not only with huge volumes of data but also with multiple types of data. While the generation and storage of high-quality data remain an important research focus, translating data into actionable information and insight is increasingly recognized as a critical research challenge. Drawing reliable conclusions often requires integrating large amounts of heterogeneous data with different formats and semantics. Given the breadth and volume of the data involved, this goal is best achieved through automated methods and tools for data integration and workflow management. This thesis presents automated strategies that combine bioinformatics and statistical methods to identify novel biomarkers in high-throughput OMICs datasets pertaining to the metabolic syndrome and to gain mechanistic insight into the underlying biological processes. An underlying theme of this thesis is data-driven approaches that generate plausible hypotheses, which are then verified experimentally.

    BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    Background: Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models, the development of improved software systems, and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported, standardised formats. In addition, the models and their components should be cross-referenced with other resources to allow their unambiguous identification. Description: BioModels Database (http://www.ebi.ac.uk/biomodels/) aims to address exactly these needs. It is a freely accessible online resource for storing, viewing, retrieving and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies and linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large-scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions: BioModels Database has become a recognised reference resource for systems biology. 
It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation systems and to study the clustering of models based on their annotations. Several publishers of scientific journals now advise depositing models in the database. The models in BioModels Database are freely distributed and reusable; the underlying software infrastructure is also available from SourceForge (https://sourceforge.net/projects/biomodels/) under the GNU General Public License.
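The submodel-extraction feature can be pictured with a small sketch. This is a hypothetical illustration, not BioModels Database's implementation: it assumes a model is just a dictionary of reactions (the `Reaction` and `Model` classes and the `extract_submodel` function are invented names), and it defines a submodel as every reaction that touches a user-chosen set of species, plus the species those reactions reference.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical minimal reaction record: sets of reactant and product species IDs."""
    reactants: set[str]
    products: set[str]

    def species(self) -> set[str]:
        return self.reactants | self.products


@dataclass
class Model:
    """Hypothetical minimal model: reactions keyed by reaction ID."""
    reactions: dict[str, Reaction] = field(default_factory=dict)

    def species(self) -> set[str]:
        # The species of a model are all species referenced by any of its reactions.
        found: set[str] = set()
        for reaction in self.reactions.values():
            found |= reaction.species()
        return found


def extract_submodel(model: Model, selected: set[str]) -> Model:
    """Keep each reaction that involves at least one selected species.

    Because species are derived from the kept reactions, the submodel never
    references a species it does not contain.
    """
    kept = {rid: rxn for rid, rxn in model.reactions.items() if rxn.species() & selected}
    return Model(reactions=kept)


# Toy network with made-up species IDs: A -> B -> C, plus an unrelated X -> Y.
toy_model = Model(reactions={
    "R1": Reaction({"A"}, {"B"}),
    "R2": Reaction({"B"}, {"C"}),
    "R3": Reaction({"X"}, {"Y"}),
})
toy_submodel = extract_submodel(toy_model, {"B"})
print(sorted(toy_submodel.reactions))  # ['R1', 'R2']
```

A real extraction tool would also have to carry over compartments, parameters, kinetic laws and annotations; the sketch deliberately ignores them to show only the network-selection step.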

    Structuring research methods and data with the research object model: genomics workflows as a case study

    Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. Preserving the materials and methods of such computational experiments, with clear annotations, is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. We assume that providing a means for the digital, structured aggregation and annotation of the objects of an experiment will supply the metadata a scientist needs to understand and recreate its results. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets and text. We applied this model to a case study in which we analysed human metabolite variation using workflows. Results: We present the application of the workflow-centric RO model to our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to query the experiment automatically and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?". Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extensible reference model that can also be used by other systems. Availability: The Research Object is available at http://www.myexperiment.org/packs/428, and the Wf4Ever Research Object Model is available at http://wf4ever.github.io/r
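The two example questions can be illustrated with a toy, in-memory version of the idea. This is a sketch under stated assumptions and not the Wf4Ever Research Object Model itself, which is expressed in RDF with its own vocabularies: the class, the predicate names (`tests`, `inputTo`, `drawnFrom`) and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field

# An annotation is a (subject, predicate, object) statement, loosely RDF-like.
Triple = tuple[str, str, str]


@dataclass
class ResearchObject:
    """Hypothetical minimal RO: a set of aggregated resources plus annotations about them."""
    aggregates: set[str] = field(default_factory=set)
    annotations: list[Triple] = field(default_factory=list)

    def aggregate(self, *resources: str) -> None:
        self.aggregates.update(resources)

    def annotate(self, subject: str, predicate: str, obj: str) -> None:
        # Annotations must describe something this RO actually aggregates.
        if subject not in self.aggregates:
            raise ValueError(f"{subject!r} is not aggregated by this Research Object")
        self.annotations.append((subject, predicate, obj))

    def subjects(self, predicate: str, obj: str) -> list[str]:
        """All subjects s with an annotation (s, predicate, obj), in insertion order."""
        return [s for s, p, o in self.annotations if p == predicate and o == obj]


def inputs_for_hypothesis(ro: ResearchObject, hypothesis: str) -> dict[str, list[str]]:
    """Answer 'which data was input to which workflow to test this hypothesis?'."""
    return {wf: ro.subjects("inputTo", wf) for wf in ro.subjects("tests", hypothesis)}


# Made-up resource names and predicates, for illustration only.
example_ro = ResearchObject()
example_ro.aggregate("workflow_1", "metabolite_snps.csv", "conclusions.txt")
example_ro.annotate("workflow_1", "tests", "hypothesis_1")
example_ro.annotate("metabolite_snps.csv", "inputTo", "workflow_1")
example_ro.annotate("conclusions.txt", "drawnFrom", "workflow_1")

print(inputs_for_hypothesis(example_ro, "hypothesis_1"))
# {'workflow_1': ['metabolite_snps.csv']}
print(example_ro.subjects("drawnFrom", "workflow_1"))
# ['conclusions.txt']
```

Refusing annotations on resources the RO does not aggregate is a design choice made here to keep the toy model self-consistent; it is not taken from the RO specification.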

    Insight in Genome-Wide Association of Metabolite Quantitative Traits by Exome Sequence Analyses

    Metabolite quantitative traits carry great promise for epidemiological studies, and their genetic background has been addressed using genome-wide association studies (GWAS). Thus far, the role of less common variants has not been exhaustively studied. Here, we performed a GWAS of metabolite quantitative traits in serum, followed by exome sequence analysis to zoom in on putative causal variants in the associated genes. ¹H nuclear magnetic resonance (¹H-NMR) spectroscopy yielded successful quantification of 42 unique metabolites in 2,482 individuals from the Erasmus Rucphen Family (ERF) study. Heritability of the metabolites was estimated with SOLAR. GWAS was performed with linear mixed models, using HapMap imputations. Based on physical vicinity and pathway analyses, candidate genes were screened for coding-region variation using exome sequence data. Heritability estimates for the metabolites ranged between 10% and 52%. GWAS replicated three known loci at metabolome-wide significance, CPS1 with glycine (P-value = 1.27 × 10⁻³²), PRODH with proline (P-value = 1.11 × 10⁻¹⁹) and SLC16A9 with carnitine (P-value = 4.81 × 10⁻¹⁴), and uncovered a novel association between DMGDH and dimethylglycine level (P-value = 1.65 × 10⁻¹⁹). In addition, we found three novel, suggestively significant loci: TNP1 with pyruvate (P-value = 1.26 × 10⁻⁸), KCNJ16 with 3-hydroxybutyrate (P-value = 1.65 × 10⁻⁸) and the 2p12 locus with valine (P-value = 3.49 × 10⁻⁸). Exome sequence analysis identified potentially causal coding and regulatory variants in the genes CPS1, KCNJ2 and PRODH, and revealed allelic heterogeneity for CPS1 and PRODH. Combined GWAS and exome analysis of metabolites detected by high-resolution ¹H-NMR is a robust approach for uncovering metabolite quantitative trait loci (mQTL) and the likely causative variants within these loci. Insight into the genetics of intermediate phenotypes is anticipated to shed further light on the genetics of complex traits.
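The first screening criterion, physical vicinity, amounts to an interval-overlap query around each lead SNP. The sketch below shows one way to express it; the ±500 kb default window, the gene records and the coordinates are hypothetical placeholders rather than values from the study, and the pathway-based criterion is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # genomic start coordinate (bp)
    end: int    # genomic end coordinate (bp), end >= start


@dataclass(frozen=True)
class LeadSnp:
    snp_id: str
    chrom: str
    pos: int


def genes_near(snp: LeadSnp, genes: list[Gene], window: int = 500_000) -> list[str]:
    """Names of genes on the SNP's chromosome that overlap [pos - window, pos + window]."""
    lo, hi = snp.pos - window, snp.pos + window
    return sorted(
        g.name
        for g in genes
        if g.chrom == snp.chrom and g.start <= hi and g.end >= lo  # interval overlap test
    )


# Placeholder genes and coordinates, not real annotation data.
toy_genes = [
    Gene("GENE_A", "2", 1_000_000, 1_050_000),
    Gene("GENE_B", "2", 1_400_000, 1_450_000),
    Gene("GENE_C", "2", 3_000_000, 3_010_000),
    Gene("GENE_D", "5", 1_000_000, 1_010_000),
]
toy_snp = LeadSnp("snp_1", "2", 1_200_000)
print(genes_near(toy_snp, toy_genes))  # ['GENE_A', 'GENE_B']
```

Testing for overlap rather than requiring the whole gene to lie inside the window means a long gene whose body only partly reaches the window is still screened.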

    Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles

    Background: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, discovering SNPs associated with metabolite levels is challenging, since testing every metabolite measured in a typical metabolomics study against every SNP incurs a severe multiple-testing penalty. We have developed an automated workflow approach that uses prior knowledge of the biochemical pathways in databases such as KEGG and BioCyc to generate a smaller SNP set relevant to each metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets. Results: Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html) confirmed previously identified hits and identified a new locus of human metabolic individuality, associating aldehyde dehydrogenase 1 family member L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (glycerol-3-phosphate acyltransferase) and CBS (cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and the relevance of some of these gene-metabolite pairs in disease development and progression. 
Conclusions: We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.
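The multiple-testing argument can be made concrete with simple arithmetic. The sketch uses a Bonferroni correction purely as an illustrative example, not necessarily the correction applied in the workflow, and the SNP and metabolite counts, pathway memberships and gene names are all hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def pathway_snp_set(metabolite: str,
                    pathway_genes: dict[str, set[str]],
                    snp_to_gene: dict[str, str]) -> set[str]:
    """SNPs mapped to any gene in a pathway linked to the metabolite (hypothetical mappings)."""
    genes = pathway_genes.get(metabolite, set())
    return {snp for snp, gene in snp_to_gene.items() if gene in genes}


# Hypothetical study size: every SNP tested against every metabolite.
n_snps, n_metabolites = 500_000, 150
exhaustive = bonferroni_threshold(0.05, n_snps * n_metabolites)

# Hypothetical prior knowledge: only SNPs mapped to pathway genes are tested for "met_1".
pathway_genes = {"met_1": {"GENE_X", "GENE_Y"}}
snp_to_gene = {"snp_1": "GENE_X", "snp_2": "GENE_Z", "snp_3": "GENE_Y"}
candidates = pathway_snp_set("met_1", pathway_genes, snp_to_gene)
focused = bonferroni_threshold(0.05, len(candidates))

print(f"exhaustive: {exhaustive:.2e}, focused: {focused:.3f}")
# exhaustive: 6.67e-10, focused: 0.025
```

The trade-off is that a focused test can only find associations within the prior-knowledge SNP set; variants outside annotated pathways are not tested at all.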

    Metabolomics reveals a link between homocysteine and lipid metabolism and leukocyte telomere length: the ENGAGE consortium

    Telomere shortening has been associated with multiple age-related diseases such as cardiovascular disease, diabetes, and dementia. However, the biological mechanisms responsible for these associations remain largely unknown. To gain insight into the metabolic processes driving the association of leukocyte telomere length (LTL) with age-related diseases, we investigated the association between LTL and serum metabolite levels in 7,853 individuals from seven independent cohorts. LTL was determined by quantitative polymerase chain reaction, and the levels of 131 serum metabolites were measured with mass spectrometry in biological samples from the same blood draw. With partial correlation analysis, we identified six metabolites that were significantly associated with LTL after adjustment for multiple testing: lysophosphatidylcholine acyl C17:0 (lysoPC a C17:0, p-value = 7.1 × 10⁻⁶), methionine (p-value = 9.2 × 10⁻⁵), tyrosine (p-value = 2.1 × 10⁻⁴), phosphatidylcholine diacyl C32:1 (PC aa C32:1, p-value = 2.4 × 10⁻⁴), hydroxypropionylcarnitine (C3-OH, p-value = 2.6 × 10⁻⁴), and phosphatidylcholine acyl-alkyl C38:4 (PC ae C38:4, p-value = 9.0 × 10⁻⁴). Pathway analysis showed that the three phosphatidylcholines and methionine are involved in homocysteine metabolism, and we found supporting evidence for an association of lipid metabolism with LTL. In conclusion, longer LTL was associated with higher levels of lysoPC a C17:0 and PC ae C38:4 and with lower levels of methionine, tyrosine, PC aa C32:1, and C3-OH. These metabolites have been implicated in inflammation, oxidative stress, and homocysteine metabolism, as well as in cardiovascular disease and diabetes, two major drivers of morbidity and mortality.
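Partial correlation measures the association between LTL and a metabolite after removing the linear effect of covariates. The abstract does not say which covariates were used, so the sketch assumes a single covariate and shows that two standard routes agree: the closed-form formula, and correlating the residuals of each variable after regressing it on the covariate. Here `x` and `y` stand in for LTL and a metabolite level and `z` for a covariate such as age; all values are invented.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: list[float], y: list[float], z: list[float]) -> float:
    """Correlation of x and y after controlling for a single covariate z (closed form)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


def residuals(v: list[float], z: list[float]) -> list[float]:
    """Residuals of an ordinary least-squares fit v ~ intercept + slope * z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    slope = sum((a - mv) * (b - mz) for a, b in zip(v, z)) / sum((b - mz) ** 2 for b in z)
    return [a - (mv + slope * (b - mz)) for a, b in zip(v, z)]


# Invented values for illustration only.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]
y = [1.0, 2.2, 1.5, 3.9, 2.8, 2.0]
z = [30.0, 45.0, 28.0, 60.0, 52.0, 41.0]

closed_form = partial_corr(x, y, z)
via_residuals = pearson(residuals(x, z), residuals(y, z))
print(abs(closed_form - via_residuals) < 1e-9)  # True
```

With several covariates the residual route generalises directly (regress on all covariates at once), whereas the closed-form expression above is specific to one covariate.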

    SBML Level 3: an extensible format for the exchange and reuse of biological models

    Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types, including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that has transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, together with new data sources such as single-cell measurements and live imaging, has prompted new ways of integrating data with models. We provide our perspectives on the challenges these developments present and on how SBML Level 3 provides the foundation needed to support this evolution.
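To make the reaction-based core concrete, the sketch below writes a tiny model (one compartment, two species, one irreversible reaction A → B) as SBML Level 3 Version 2 core XML using only Python's standard library. It is a minimal illustration under stated assumptions: the model, species and reaction IDs are made up, kinetic laws, units and package extensions (which add elements in their own namespaces) are omitted, and the output has not been run through an SBML validator. For real work a dedicated library such as libSBML would be the usual choice.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"


def q(name: str) -> str:
    """Qualify an element name with the SBML Level 3 Version 2 core namespace."""
    return f"{{{SBML_NS}}}{name}"


def build_minimal_sbml() -> str:
    """Serialize a toy model (A -> B in one compartment) as SBML L3V2 core XML."""
    ET.register_namespace("", SBML_NS)  # emit the core namespace as the default namespace
    sbml = ET.Element(q("sbml"), {"level": "3", "version": "2"})
    model = ET.SubElement(sbml, q("model"), {"id": "toy_model"})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), {"id": "cell", "size": "1", "constant": "true"})

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, amount in (("A", "10"), ("B", "0")):
        ET.SubElement(species, q("species"), {
            "id": sid, "compartment": "cell", "initialAmount": amount,
            "hasOnlySubstanceUnits": "false", "boundaryCondition": "false", "constant": "false",
        })

    reactions = ET.SubElement(model, q("listOfReactions"))
    reaction = ET.SubElement(reactions, q("reaction"), {"id": "R1", "reversible": "false"})
    for list_name, sid in (("listOfReactants", "A"), ("listOfProducts", "B")):
        refs = ET.SubElement(reaction, q(list_name))
        ET.SubElement(refs, q("speciesReference"),
                      {"species": sid, "stoichiometry": "1", "constant": "true"})

    return ET.tostring(sbml, encoding="unicode")


toy_sbml = build_minimal_sbml()
print(toy_sbml)
```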
